Add a formal semver 2.0.0 version type #371

Open
wants to merge 58 commits into
base: feature-PR371-semver2.0


@darakian darakian commented Dec 9, 2024

First crack at adding a formal version type in response to #362 (comment). Any others that are agreed upon should be spun up in their own PRs so that conversations in the PRs can be kept on topic.

Happy to expand this if people think the full semver spec should be in this repo as well. I went back and forth on that.

Another thought is that maybe this should be a retroactive definition of the semver type. That would likely be breaking for some of the current records though.

The goal here is to have strict validation provided by CVE Services.

@sei-vsarvepalli
Contributor

I recommend you resubmit the PR with a change in both schema/docs/CVE_Record_Format_bundled_adpContainer.json and schema/docs/CVE_Record_Format_bundled_cnaContainer.json, focusing on the version field. This PR, which changes only example.md, will not be useful without schema-based validation, as example.md is only human-friendly Markdown.

It will be best to target a JSON schema validation instead of programmatically verifying versions when they are specific like this scenario with a clear semver-2.0.0 compliance being tested.

Secondly, we should follow and extend the current schema model to satisfy this need instead of adding completely new JSON schema fields like exclusiveUpperBound, which is not really as intuitive as lessThan.

See the current versions.md document which has some examples

https://github.com/CVEProject/cve-schema/blob/main/schema/docs/versions.md

{
  "version": "2.0.0",
  "versionType": "semver",
  "lessThanOrEqual": "2.5.1",
  "status": "affected"
}

The one we don't currently have is the exclusiveLowerBound that you mention. However, the other examples can be mapped according to the current schema. Potentially we can add a greaterThan boolean field which, when present, means the version field should be treated as ">" instead of ">=", which is the current default for the "version" field.

So your Example will actually look like

            {
              "versionType": "semver-2.0.0",
              "version": "1.2.3-alpha",
              "lessThan": "2.3.4+build17"
            }
            {
              "versionType": "semver-2.0.0",
              "version": "3.4.5-beta",
              "greaterThan": true,
              "lessThanOrEqual": "4.5.6+assembly88"
            }
            {
              "versionType": "semver-2.0.0",
              "version": "5.6.7-gamma"
            }
            {
              "versionType": "semver-2.0.0",
              "version": "6.7.8-delta"
            }

You need to build a JSON schema validator to work with such data, with versionType frozen via an enum to semver-2.0.0 and with the "version", "lessThanOrEqual" and "lessThan" fields checked by a regex validator:
/^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$/
Finally, perhaps provide the additional "greaterThan" boolean field, which will treat version as ">" instead of ">=".

@darakian
Author

darakian commented Feb 13, 2025

Thanks for the comment. I can update the JSON in this PR once we get to consensus 👍

With respect to the range fields themselves, after seeing you rewrite my example I think it makes sense to simplify and create new fields so that a parser doesn't need to implement conditional logic based on the combination of fields present. I think this will make for simpler and more maintainable code long term. Maybe more people can chime in on this point.

As for the regex it looks like the one you're suggesting is the second of the two provided on semver.org. Albeit with a leading and trailing /.

For documentation's sake here are the two

One with named groups for those systems that support them (PCRE [Perl Compatible Regular Expressions, i.e. Perl, PHP and R], Python and Go).

^(?P<major>0|[1-9]\d*)\.(?P<minor>0|[1-9]\d*)\.(?P<patch>0|[1-9]\d*)(?:-(?P<prerelease>(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?(?:\+(?P<buildmetadata>[0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$
and

one with numbered capture groups instead (so cg1 = major, cg2 = minor, cg3 = patch, cg4 = prerelease and cg5 = buildmetadata) that is compatible with ECMA Script (JavaScript), PCRE (Perl Compatible Regular Expressions, i.e. Perl, PHP and R), Python and Go.

^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$
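
As a rough illustration (not part of the PR), the named-group expression can be exercised with Python's `re` module. This is only a sketch: `parse_semver` is a hypothetical helper name, and using `re.ASCII` with `fullmatch` is a choice made here so that `\d` stays within 0-9 and a trailing newline is rejected.

```python
import re

# Named-group pattern as quoted above from semver.org.
SEMVER_2_0_0 = re.compile(
    r"^(?P<major>0|[1-9]\d*)\.(?P<minor>0|[1-9]\d*)\.(?P<patch>0|[1-9]\d*)"
    r"(?:-(?P<prerelease>(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)"
    r"(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?"
    r"(?:\+(?P<buildmetadata>[0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$",
    re.ASCII,
)

def parse_semver(text):
    """Return the named parts as a dict, or None if text is not SemVer 2.0.0."""
    match = SEMVER_2_0_0.fullmatch(text)
    return match.groupdict() if match else None

print(parse_semver("1.2.3-alpha+build17"))
print(parse_semver("2.5"))
```

Groups that are absent from the input (for example `prerelease` in `1.2.3`) come back as `None`.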

…for the expressions of "everything under X" or "everything over Y"
@darakian
Author

I had a thought about one-sided ranges, so I added two more examples

            {
              "versionType": "semver-2.0.0",
              "exclusiveUpperBound": "1.0.0"
            }
            {
              "versionType": "semver-2.0.0",
              "inclusiveLowerBound": "9.0.0"
            }

These allow someone to express the idea of everything under X or everything over Y. The former of those two is reasonably common.

…-02-20. The status conversation will happen another day
@darakian
Author

So your Example will actually look like
...

           {
             "versionType": "semver-2.0.0",
             "version": "3.4.5-beta",
             "greaterThan": true,
             "lessThanOrEqual": "4.5.6+assembly88"
           }

@sei-vsarvepalli where does the greaterThan parameter come from? I'm prepping my comparison of the two representations for Thursday's QWG and I can't find a reference to this parameter in the docs. Searching the repo for the string brings back only this PR
https://github.com/search?q=repo%3ACVEProject%2Fcve-schema%20greaterThan&type=code
Am I missing something?

@sei-vsarvepalli
Contributor

So your Example will actually look like
...

           {
             "versionType": "semver-2.0.0",
             "version": "3.4.5-beta",
             "greaterThan": true,
             "lessThanOrEqual": "4.5.6+assembly88"
           }

@sei-vsarvepalli where does the greaterThan parameter come from? I'm prepping my comparison of the two representations for Thursday's QWG and I can't find a reference to this parameter in the docs. Searching the repo for the string brings back only this PR https://github.com/search?q=repo%3ACVEProject%2Fcve-schema%20greaterThan&type=code Am I missing something?

The field greaterThan does not exist today. It could be an option if you want to maintain the other fields as-is and then add something without having to create a whole new set of fields. Appending an optional field greaterThan is a non-breaking change, allowing other versionType fields to adopt something similar as we move towards enforcing stricter schema checks.

@darakian
Author

darakian commented Feb 27, 2025

Gotcha. Then I guess the difference between the two approaches in schema terms is to add a greaterThan parameter vs adding the inclusiveLowerBound, exclusiveLowerBound, inclusiveUpperBound, exclusiveUpperBound, and exactly parameters.

I've written a pretty simple parser in Python for my proposal. It assumes perfect (validated) data and that the data is semver-2.0.0, but I think it gets the point across on the simplicity of parsing. Feel free to play around with it as well by changing the specific parameters in the test. I think I covered all the cases and it can probably be simplified further.

import json

test_json_string = """
{
    "versionType": "semver-2.0.0",
    "status": "affected",
    "exclusiveLowerBound": "1.2.3-alpha",
    "inclusiveUpperBound": "2.3.4+build17"
}
"""

def parse_decoded_json(json):
    # Note: the parameter name shadows the json module inside this function.
    # An exact version stands alone, so no bounds need to be checked.
    if json.get("exactly"):
        return f'= {json.get("exactly")}'

    # Collect whichever bounds are present so one-sided ranges render cleanly.
    bounds = []
    if json.get("inclusiveLowerBound"):
        bounds.append(f'>= {json.get("inclusiveLowerBound")}')
    elif json.get("exclusiveLowerBound"):
        bounds.append(f'> {json.get("exclusiveLowerBound")}')

    if json.get("inclusiveUpperBound"):
        bounds.append(f'<= {json.get("inclusiveUpperBound")}')
    elif json.get("exclusiveUpperBound"):
        bounds.append(f'< {json.get("exclusiveUpperBound")}')

    return ", ".join(bounds)

the_json = json.loads(test_json_string)
print(parse_decoded_json(the_json))

I initially had

lower = f'{">= "+json.get("inclusiveLowerBound") if json.get("inclusiveLowerBound") else "> "+ json.get("exclusiveLowerBound")}'
upper = f'{"<= "+json.get("inclusiveUpperBound") if json.get("inclusiveUpperBound") else "< "+ json.get("exclusiveUpperBound")}'

However, that doesn't handle one-sided ranges and I wanted to get some code up before today's QWG meeting. I also haven't had time to make a complete comparison parser, but translating the section

if json.get("exactly"):
	return f'= {json.get("exactly")}'

results in something that needs to look like

if json.get("version") and not (json.get("lessThan") or json.get("greaterThan") or json.get("lessThanOrEqual")):
	return f'= {json.get("version")}'

as the code needs to be sure that the parameter version stands alone. Having a new parameter with a single function simplifies that logic.
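
For comparison, a minimal sketch of what a parser for the other representation might look like (the existing version/lessThan/lessThanOrEqual fields plus the suggested boolean greaterThan). Everything here is an assumption for illustration: `parse_existing_fields` is a hypothetical name, greaterThan does not exist in the schema today, and the reading of a lone `greaterThan: true` as an open-ended `> version` range is a guess at the intended semantics.

```python
def parse_existing_fields(entry):
    """Render a versions entry that uses version/lessThan/lessThanOrEqual
    plus the proposed (hypothetical) boolean greaterThan."""
    version = entry["version"]
    has_upper = "lessThan" in entry or "lessThanOrEqual" in entry
    exclusive_lower = entry.get("greaterThan") is True

    # A lone version with no range modifiers means exactly that version.
    if not has_upper and not exclusive_lower:
        return f"= {version}"

    # The meaning of "version" depends on which other fields are present.
    parts = [f"> {version}" if exclusive_lower else f">= {version}"]
    if "lessThan" in entry:
        parts.append(f"< {entry['lessThan']}")
    elif "lessThanOrEqual" in entry:
        parts.append(f"<= {entry['lessThanOrEqual']}")
    return ", ".join(parts)

print(parse_existing_fields({
    "versionType": "semver-2.0.0",
    "version": "3.4.5-beta",
    "greaterThan": True,
    "lessThanOrEqual": "4.5.6+assembly88",
}))
```

The conditional on the combination of fields present is the part the dedicated-field proposal tries to avoid.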

@darakian darakian changed the base branch from main to feature-PR371-semver2.0 February 28, 2025 17:17
@darakian
Author

darakian commented Mar 5, 2025

@sei-vsarvepalli the new properties are in as of commit 62db169, however I'm not sure how to express the valid combinations of parameters for the semver 2.0.0 version type. Do I need to do something like a oneOf for the versions block itself? eg.

"versions": {
                    "oneOf": [
                    "type": "array",
                    "description": "Set of product versions or version ranges related to the vulnerability. The versions satisfy the CNA Rules [8.1.2 requirement](https://cve.mitre.org/cve/cna/rules.html#section_8-1_cve_entry_information_requirements). Versions or defaultStatus may be omitted, but not both.",
                    "minItems": 1,
                    "uniqueItems": true,
                    "items": {
                        "type": "object",
                        ...

Where the first option in the one of is the entire current payload and the other is the semver 2.0.0? Maybe you know a simpler approach?

If this is valid then still need to ensure version type is set to semver-2.0.0 for these combinations
@darakian
Author

darakian commented Mar 6, 2025

I let this stew for a bit and I think 046dadd is a step in the right direction. I think it's possible to only allow those parameter combinations when the version type is semver 2.0.0, but I'm not sure how to encode that yet.

@darakian
Author

darakian commented Mar 12, 2025

@sei-vsarvepalli Ok, so I'm trying to run the tests locally and it seems I need to rebuild dist/cve5validator.js. When attempting to do so though I get Error: Cannot find module '../../docs/CVE_JSON_bundled.json'. It looks like that file got renamed here
a3babe8

However that file doesn't seem to reference the CVE schema file that I've been making edits to, so I'm a little confused how this all works for local testing. Am I missing something basic here? Am I editing the wrong file?

@sei-vsarvepalli
Contributor

sei-vsarvepalli commented Mar 12, 2025

@sei-vsarvepalli Ok, so I'm trying to run the tests locally and it seems I need to rebuild dist/cve5validator.js. When attempting to do so though I get Error: Cannot find module '../../docs/CVE_JSON_bundled.json'. It looks like that file got renamed here a3babe8

However that file doesn't seem to reference the CVE schema file that I've been making edits to, so I'm a little confused how this all works for local testing. Am I missing something basic here? Am I editing the wrong file?

What tests are you running? It looks like the starting point of your repo is main, which has diverged quite a bit too. Perhaps start with either the develop branch or feature-144-SSVC, which seem more like the target for 5.2.0, where this semver update is expected to be bundled.

Your JSON file is also mangled: line 323 is missing a comma. When I run the tests against your branch I get this error

ParserError: Error parsing ./cve-schema/schema/cve-schema.json: missed comma between flow collection entries (324:29)

 321 |                                     {"required": ["exclusiv ...
 322 |                                 ]
 323 |                             }
 324 |                             {
-----------------------------------^
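
A similar check can be run without the project tooling. The following is a small sketch using Python's standard `json` module; `first_json_error` is a hypothetical helper name and the sample text is made up for illustration, not taken from the schema file.

```python
import json

def first_json_error(text):
    """Return (line, column, message) for the first syntax error, or None."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return err.lineno, err.colno, err.msg
    return None

# Two array entries with no comma between them, as in the report above.
broken = '[\n  {"required": ["version"]}\n  {"required": ["status"]}\n]'
print(first_json_error(broken))
```

Like most parsers, this stops at the first error, so it has to be rerun after each fix.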

@darakian
Author

Thanks for pointing out the comma. Added that in.

I'm trying to run the node validation suite with node validate.js ../tests/valid/semver2-0-0.json. The test is running fwiw. I'm getting the following

jon~/g/!/c/s/s/Node_Validator:add-semver-2.0.0-versionType❯❯❯ node validate.js ../tests/valid/semver2-0-0.json
../tests/valid/semver2-0-0.json is invalid:
[
  {
    instancePath: '/containers/cna/affected/0/versions/0',
    schemaPath: '#/properties/versions/items/oneOf/0/maxProperties',
    keyword: 'maxProperties',
    params: { limit: 2 },
    message: 'must NOT have more than 2 properties'
  },
  {
    instancePath: '/containers/cna/affected/0/versions/0',
    schemaPath: '#/properties/versions/items/oneOf/1/required',
    keyword: 'required',
    params: { missingProperty: 'version' },
    message: "must have required property 'version'"
  },
  {
    instancePath: '/containers/cna/affected/0/versions/0',
    schemaPath: '#/properties/versions/items/oneOf/2/required',
    keyword: 'required',
    params: { missingProperty: 'version' },
    message: "must have required property 'version'"
  },
  {
    instancePath: '/containers/cna/affected/0/versions/0',
    schemaPath: '#/properties/versions/items/oneOf/3/required',
    keyword: 'required',
    params: { missingProperty: 'version' },
    message: "must have required property 'version'"
  },
  {
    instancePath: '/containers/cna/affected/0/versions/0',
    schemaPath: '#/properties/versions/items/oneOf',
    keyword: 'oneOf',
    params: { passingSchemas: null },
    message: 'must match exactly one schema in oneOf'
  },
  {
    instancePath: '/cveMetadata/state',
    schemaPath: '#/properties/state/enum',
    keyword: 'enum',
    params: { allowedValues: [Array] },
    message: 'must be equal to one of the allowed values'
  },
  {
    instancePath: '',
    schemaPath: '#/oneOf',
    keyword: 'oneOf',
    params: { passingSchemas: null },
    message: 'must match exactly one schema in oneOf'
  }
]
Summary: Validation FAILED for 1 out of 1 files!

Which made me think that the validation is failing to match a case on the versions section and hence looking into build.js. I could rebase this branch but it doesn't feel like that's an issue here.

@darakian
Author

darakian commented Jul 17, 2025

I've gone ahead and addressed some of the trailing commas (line numbers would help for the others 🙇) as well as the asymmetry in parameter requirements. I believe we already discussed * and 0 back here.
#371 (comment)
My position has not changed and Andy sums it up well: the goal is to be semver compliant and special cases break that.

The proposed schema introduces ambiguity such as:

         "versions": [
            {
              "version": "4.0",
              "status": "affected",
              "greaterThanOrEqual": "5.0",
              "versionType": "semver"
            }

I believe you meant to use semver-2.0.0 as the type there, but either way it's also possible to input invalid data with the current version types. Change greaterThanOrEqual: 5.0 to lessThan: 0.1 and you have the same problem. You can't guard against that with schema validation, so we would need CVE Services to enforce that if it's desired. I know we talked about this in some of the QWG meetings, but it is also touched on a bit here
#371 (comment)
So, this is not ambiguity which is introduced, but rather inherited.

If you think cve services should provide range checking I'd love to work with you on building that out 👍
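
As a starting point for that conversation, here is a rough sketch of the kind of range check that cannot be expressed in the schema. It orders versions by SemVer 2.0.0 precedence (build metadata ignored, numeric pre-release identifiers below alphanumeric ones, a release above its pre-releases). The names `precedence_key` and `range_is_sane` are hypothetical, and the code assumes its inputs already passed the regex.

```python
def precedence_key(version):
    """Sort key following SemVer 2.0.0 precedence; assumes a valid string."""
    core, _, _build = version.partition("+")      # build metadata is ignored
    core, _, prerelease = core.partition("-")
    major, minor, patch = (int(part) for part in core.split("."))
    if not prerelease:
        # A release sorts after every pre-release of the same core version.
        return (major, minor, patch, 1, ())
    identifiers = tuple(
        # Numeric identifiers compare as numbers and sort before text ones.
        (0, int(ident), "") if ident.isdigit() else (1, 0, ident)
        for ident in prerelease.split(".")
    )
    return (major, minor, patch, 0, identifiers)

def range_is_sane(lower, upper):
    """True when lower sorts strictly before upper."""
    return precedence_key(lower) < precedence_key(upper)

print(range_is_sane("4.0.0", "5.0.0"))
print(range_is_sane("4.0.0", "0.1.0"))
```

A check like this would flag a record whose upper bound sorts below its lower bound, regardless of which field names carry the two values.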

This means that non-semver strings such as "2.5" can occur in the changes array when semver-2.0.0 is used, e.g., this is accepted by the proposed schema:

Oh, good catch. For what it's worth, it looks to me like versions can already mismatch today too, e.g. semver could be used as the type for affected with Custom in the changes (or vice versa). I can certainly make a semver-2.0.0 specific fix but maybe the changes array should be made more strict generally.

"In the base case both record producers and record consumers can simply ignore the new data type." There are important record consumer use cases that cannot ignore it. One example is the cve.org website. Its objective is to present all version information in a human-readable form.

So, I asked back here #371 (comment) if a reference implementation would be helpful and it seems like maybe it would be. I do wonder if the CVE website could simply display the versions as strings though, as I believe that's how current versions are handled.


@alilleybrinker

@darakian, would there be a problem with splitting out the introduction of greaterThan and greaterThanOrEqual into a separate proposal, leaving this one to only add the new semver-2.0.0 version type?

I'm not in love with that, but I could be open to it. I'd like to get broader consensus before entertaining the idea.

@alilleybrinker

Regarding having CVE Services check version bounds, since it's not possible within the schema constraints: in the Package URL proposal we've recently agreed that CVE Services would be responsible for validating Package URLs, since Package URL parsing is too complex to constrain in a regex inside the schema.

I think it's fine that some constraints end up in CVE Services when they can't be done in the schema.

@ElectricNroff

ElectricNroff commented Jul 17, 2025

I believe you meant to use semver-2.0.0 as the type there, but either way it's also possible to input invalid data with the current version types. Change greaterThanOrEqual: 5.0 to lessThan: 0.1 and you have the same problem.

I had intentionally used semver (not semver-2.0.0) when writing:

         "versions": [
            {
              "version": "4.0",
              "status": "affected",
              "greaterThanOrEqual": "5.0",
              "versionType": "semver"
            }

but either one is valid in the proposed schema. This isn't an inherited problem. There are now two properties that have the same meaning in this context (version and greaterThanOrEqual) and they can have different values. I agree that it's slightly similar to other data-inconsistency problems that are inherited.

More importantly, a semver-2.0.0 data producer needs to be aware of:

  1. if there is no fixed version, then you should use greaterThanOrEqual with the value of the earliest affected version
  2. if there is a fixed version, then you should specify both the fixed version and the earliest affected version in the same array element. The fixed version goes in the lessThan field. The earliest affected version is also entered, just like before, except that you need to spell greaterThanOrEqual differently. You need to spell it version or else it won't work.
  3. if you know the last affected version (e.g., "lessThanOrEqual": "1.2") but don't know the earliest affected version, and want to capture the largest possible range, then you need to enter 0.0.0-0 because that is ordered before all other versions [THIS IS ALREADY ADDRESSED IN TODAY'S COMMITS]

I believe rule 2 is too ridiculous and we shouldn't ship a schema with that behavior, because the support costs would be too high.

Rule 3 had also been harmful to data integrity, because it conflates the concepts of "don't know" with "a version named 0.0.0-0 existed and was vulnerable."

@ElectricNroff

Seems like the trailing commas can just be deleted.

To find the remaining trailing commas without local tools, one can use websites such as jsonlint.com

Invalid JSON!
Error: Parse error on line 386:
...                    ]                  
-----------------------^
Expecting 'STRING', 'NUMBER', 'NULL', 'TRUE', 'FALSE', '{', '[', got ']'

It only identifies one of the trailing commas at a time. I don't know how many remain (there's at least one).
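
For anyone who prefers to list every candidate in one pass, here is a small sketch of a scanner that reports each comma followed only by whitespace and a closing bracket. `trailing_comma_lines` is a hypothetical helper, not part of the repository tooling, and it only looks for this one kind of error.

```python
def trailing_comma_lines(text):
    """Return 1-based line numbers of commas followed only by whitespace
    and then a closing ] or }. String contents are skipped."""
    hits, pending, in_string, escaped, line = [], None, False, False, 1
    for char in text:
        if in_string:
            if escaped:
                escaped = False
            elif char == "\\":
                escaped = True
            elif char == '"':
                in_string = False
        elif char == '"':
            in_string, pending = True, None
        elif char == ",":
            pending = line          # remember where the comma was seen
        elif char in "]}":
            if pending is not None:
                hits.append(pending)
            pending = None
        elif not char.isspace():
            pending = None          # something real followed the comma
        if char == "\n":
            line += 1
    return hits

sample = "\n".join(['{', '  "a": [1, 2,', '  ],', '  "b": "x,]",', '}'])
print(trailing_comma_lines(sample))
```

The reported line is the one containing the comma, which may be a line or two above where a JSON parser reports the failure.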

@darakian
Author

darakian commented Jul 18, 2025

I'll get to the rest of the trailing commas later today. Thanks for the tool 👍

More importantly, a semver-2.0.0 data producer needs to be aware of:

1. if there is no fixed version, then you should use `greaterThanOrEqual` with the value of the earliest affected version

2. if there is a fixed version, then you should specify both the fixed version and the earliest affected version in the same array element. The fixed version goes in the `lessThan` field. The earliest affected version is also entered, just like before, except that you need to spell `greaterThanOrEqual` differently. You need to spell it `version` or else it won't work.

3. if you know the last affected version (e.g., `"lessThanOrEqual": "1.2"`) but don't know the earliest affected version, and want to capture the largest possible range, then you need to enter `0.0.0-0` because that is ordered before all other versions [THIS IS ALREADY ADDRESSED IN TODAY'S COMMITS]

To address these point by point

  1. The schema cannot know of the existence of a fixed version for a piece of software and so cannot enforce behavior dependent on the existence of a fixed version. We can at most document best practice, which I'm fine with doing.

  2. If I'm consuming a record to see if it applies to me, my task is to do an intersection between the set of versions I use and the set of versions labeled as vulnerable/broken/bad. Intersecting my set 1.2.3 && 1.7.19 && 0.7.3 && 19.1.1 with >= 0.0.0, < 13.3.7 and with < 13.3.7 gives the same results. This is an aesthetic difference. If you want to say that we don't allow for ranges which are unbounded from below then I'm ok with that. I believe we discussed this in a QWG meeting and Chris commented on using 0.0.0 as a lower bound back here
    Add a formal semver 2.0.0 version type #371 (comment)
    This might be a case where enforcement/normalization can be done via CVE Services.

  3. I agree that the current construction is a bit awkward. How would you feel about a construction where lower bounds always use lessThan/lessThanOrEqual and upper bounds always use greaterThan/greaterThanOrEqual?
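
The intersection argument above can be sketched in a few lines. This is an illustration only: `as_tuple` and `affected` are hypothetical helpers, and the parsing deliberately handles plain MAJOR.MINOR.PATCH strings and nothing else.

```python
def as_tuple(version):
    """Parse a plain MAJOR.MINOR.PATCH string (no pre-release handling)."""
    return tuple(int(part) for part in version.split("."))

def affected(mine, upper_exclusive, lower_inclusive=None):
    """Return the versions in `mine` that fall inside the range."""
    upper = as_tuple(upper_exclusive)
    lower = as_tuple(lower_inclusive) if lower_inclusive else None
    return [
        v for v in mine
        if as_tuple(v) < upper and (lower is None or as_tuple(v) >= lower)
    ]

mine = ["1.2.3", "1.7.19", "0.7.3", "19.1.1"]
print(affected(mine, "13.3.7", "0.0.0"))   # >= 0.0.0, < 13.3.7
print(affected(mine, "13.3.7"))            # < 13.3.7
```

Both calls select the same three versions. Note that this holds for release versions; pre-releases of 0.0.0 sort below 0.0.0 under SemVer 2.0.0, which is where the 0.0.0-0 discussion comes from.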

@ElectricNroff

ElectricNroff commented Jul 31, 2025

For 'How would you feel about a construction where lower bounds always use lessThan/lessThanOrEqual and upper bounds always use greaterThan/greaterThanOrEqual?': I am opposed to this for the 5.x version series of the CVE Record Format.

Consumers today can use the GET /cve/:id API to retrieve one CVE Record. They have built processes that recognize three allowable cases related to an element of the versions array: either a version property exists and is the only affected version, a version property exists and is the beginning of a range, or there is defaultStatus without a versions array at all (meaning that the same status applies to all versions). If any new case is introduced, such as greaterThanOrEqual with neither version (inside the versions array) nor defaultStatus (outside the versions array), then the API is not backward compatible from the perspective of a caller of GET /cve/:id.

Writing a reference implementation of different behavior is not sufficient. If we change the behavior at some future point (in favor of greaterThanOrEqual or other new properties), then we need a communication plan that can effectively reach consumers, and a substantial period of time for consumers to adapt their use cases to new business logic. This would typically be announced as a substantial update, one with breaking changes for GET /cve/:id API consumers, etc.

I believe the correct approach to SemVer is along the lines of what I originally suggested in 2023 at #263 - that:

  1. Producers are not entitled to declare their version as (any type of) SemVer unless they comply with the intention of the CVE Record Format.
  2. Ability to do that should be considered a "loophole" in the schema, and can be addressed in a way that is similar to how other loopholes were addressed in https://github.com/CVEProject/cve-schema/releases/tag/v5.1.0
  3. There is not enough justification to support any differences in how versions are expressed between SemVer 1 and SemVer 2.0.0. Very few CVE Records have version numbers that are valid under SemVer 1 but not valid under SemVer 2.0.0. (Also, many of these are from producers who are using date-based version numbers that begin with a year - they are not following the SemVer semantics of major version, and should not be writing "semver" regardless of whether they have version strings that satisfy the regular expression.)
  4. Going forward, when a data item is intended to be a SemVer version number, it should be accepted only if it complies with SemVer 2.0.0. The minimal usage of version numbers that are valid SemVer 1 but not valid SemVer 2.0.0 can be labeled as "custom" without any significant impact to consumers.
  5. Finally, in the "when a data item is intended to be a SemVer version number" phrase, I mean that producers can continue to use 0 and * as they do today. These are not intended to be SemVer version numbers and are instead an alternate syntax (that happens to be used within the same fields such as version or lessThan). The alternate syntax can be abandoned in CVE Record Format 6.

I tried to extract every string from every current CVE Record that is intended to be a SemVer version number, and then I compared them to the SemVer 1 regular expression and to the SemVer 2.0.0 regular expression. The result was that 1.4% (see below) of these version numbers were valid for SemVer 1 but not valid for SemVer 2.0.0. A reasonable assessment is that SemVer 2.0.0 is sufficient for the community's needs, and should be what "semver" means in the CVE Record Format going forward.

We should not be considering a complex and controversial semver-2.0.0 proposal to address a 1.4% case.

01.01.2024
04.07.01
06.09.04
09.05.2025
1.0.06
1.0.5-0185
1.0.7-0298
1.01.1
1.1.1-0383
1.12.04
1.2.0-0525
1.4.4-0635
1.4.7-0687
1.6.001
1.7.08
1.8.02
12.15.01
12.4.05
17.012.30205
17.012.30229
19.02.2024
2.0.02
2.05.03
2.7.01
20.005.30514
20.02.2025
22.003.20281
22.01.02
22.02.1
22.03.8
23.003.20244
23.09.1
24.001.30123
24.002.20736
24.002.20991
24.003.20054
24.06.1
25.001.20428
25.001.20521
3.3.03
3.3.07
3.3.08
3.3.091
3.5.04
3.5.07
5.2.02
5.25.08
5.3.01
5.3.02
5.4.01
5.6.06
6.20.01
6.30.03
7.06.013
7.18.03
8.30.01
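
One way to see why strings like these are rejected by SemVer 2.0.0 is to look for numeric fields with leading zeros, which that version of the spec does not allow. The sketch below is an illustration only: `leading_zero_fields` is a hypothetical helper, and it does not account for build metadata (where leading zeros are permitted), which none of the listed strings use.

```python
import re

def leading_zero_fields(text):
    """Return the purely numeric fields (split on '.' and '-') that start
    with 0 and are longer than one digit."""
    fields = re.split(r"[.-]", text)
    return [f for f in fields if f.isdigit() and len(f) > 1 and f.startswith("0")]

for sample in ["01.01.2024", "1.0.06", "1.0.5-0185", "17.012.30205"]:
    print(sample, leading_zero_fields(sample))
```

Each of the four samples reports at least one offending field, while a string such as `1.0.6` reports none.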

@ElectricNroff

There was some discussion in the QWG today about a schema in which there was always a version property in each element of the versions array. I agree that that schema may exist somewhere, and I agree that that schema might resolve some of the differences of opinion. However, as far as I can tell, that is not the schema associated with the latest version of the pull request. The schema file from the latest version of the pull request is https://github.com/CVEProject/cve-schema/blob/46c5293235eaf6908410b0542c5f770f4c1e1abb/schema/CVE_Record_Format.json and has:

         "oneOf": [
                            {
                                "required": ["version", "status"],
                                "maxProperties": 2
                            },
                            {
                                "required": ["version", "status", "versionType"],
                                "maxProperties": 3
                            },
                            {
                                "required": ["status", "versionType", "lessThan"]
                            },
                            {
                                "required": ["status", "versionType", "lessThanOrEqual"]
                            },
                            {
                                "required": ["status", "versionType", "greaterThan"]
                            },
                            {
                                "required": ["status", "versionType", "greaterThanOrEqual"]
                            }
                        ],

With this schema, it is valid to write:

         "versions": [
            {
              "greaterThanOrEqual": "1.0.0",
              "status": "affected",
              "versionType": "semver"
            }
          ]

(i.e., no version property at all). Regardless of when your proposal is adopted by the group, it may be useful if everyone is talking about the same schema.
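
To make the branch matching concrete, here is a minimal sketch that evaluates only the `required` and `maxProperties` keywords from the `oneOf` quoted above. It is not a JSON Schema implementation, `matching_branches` is a hypothetical name, and every other constraint in the real schema file is ignored.

```python
ONE_OF = [
    {"required": ["version", "status"], "maxProperties": 2},
    {"required": ["version", "status", "versionType"], "maxProperties": 3},
    {"required": ["status", "versionType", "lessThan"]},
    {"required": ["status", "versionType", "lessThanOrEqual"]},
    {"required": ["status", "versionType", "greaterThan"]},
    {"required": ["status", "versionType", "greaterThanOrEqual"]},
]

def matching_branches(entry):
    """Indexes of oneOf branches whose required/maxProperties checks pass."""
    return [
        index for index, branch in enumerate(ONE_OF)
        if all(name in entry for name in branch["required"])
        and len(entry) <= branch.get("maxProperties", len(entry))
    ]

entry = {"greaterThanOrEqual": "1.0.0", "status": "affected", "versionType": "semver"}
print(matching_branches(entry))
```

`oneOf` is satisfied when exactly one branch matches. Here only the last branch does, so the entry passes without a version property.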

@darakian
Author

darakian commented Jul 31, 2025

@ElectricNroff can we dig in on a concern I may not be fully understanding? Let's use the json you just posted

         "versions": [
            {
              "greaterThanOrEqual": "1.0.0",
              "status": "affected",
              "versionType": "semver"
            }
          ]

I see this as an encoding of the expression all versions greater than or equal to 1.0.0 and I don't see lack of the version property as a problem. So

  1. Do you think the version property should always be present even if its interpretation will vary versionType to versionType?
  2. If I changed the construction such that the version property would always be present would that suffice? Or put another way, what would you want to always be true of a versions block?

Give me a bit more time to parse your longer post above.

@ElectricNroff

Yes, I want the version property to always be present even if its interpretation will vary versionType to versionType. To me, it is appealing that it always means "the point in the lifecycle of this thing at which the vulnerability was first present." Any consumer today may have a data-processing rule, or an alert system, or even a UI layout that focuses on that specific property. An exception, of course, is OmniBOR, where the thing is immutable, i.e., the sha256 hash itself doesn't have a lifecycle (and thus we currently only use defaultStatus and don't use a versions array at all).

Just having version always required, without any other redesign, isn't an effective solution. For changes within the CVE Record Format 5.x series, I would like to offer the behavior of "if you see an unrecognized data element, and simply ignore it, you are no worse off." For

"versions": [
            {
              "version": "1.0.0",
              "greaterThanOrEqual": "1.0.0",
              "status": "affected",
              "versionType": "semver-2.0.0"
            }
          ]

if you ignore greaterThanOrEqual then you conclude that exactly 1.0.0 (and nothing else) is affected. The consumer thus has a huge misinterpretation of the data, one that could realistically change the picture from "you are unlikely to be vulnerable" to "you are almost certainly vulnerable."
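
A minimal sketch of that failure mode, assuming a consumer written against today's fields only. `legacy_reading` is a hypothetical function, not code from any real consumer.

```python
def legacy_reading(entry):
    """How a consumer that only knows version/lessThan/lessThanOrEqual
    might render an entry; unknown properties are silently ignored."""
    version = entry["version"]
    if "lessThan" in entry:
        return f">= {version}, < {entry['lessThan']}"
    if "lessThanOrEqual" in entry:
        return f">= {version}, <= {entry['lessThanOrEqual']}"
    return f"= {version}"

entry = {
    "version": "1.0.0",
    "greaterThanOrEqual": "1.0.0",
    "status": "affected",
    "versionType": "semver-2.0.0",
}
print(legacy_reading(entry))
```

The producer meant an open-ended range, but this consumer reports a single affected version.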

@darakian
Author

darakian commented Aug 1, 2025

Yes, I want the version property to always be present even if its interpretation will vary versionType to versionType. To me, it is appealing that it always means "the point in the lifecycle of this thing at which the vulnerability was first present."

I think these two sentences are at odds with each other (at least as I read them). If the meaning of version can vary then one could construct another meaning than "first known vulnerable". I do agree that it's appealing to know a first known vulnerable version though and let me get back to you tomorrow with a concrete proposal for a different construction. We can discuss specifics from there 👍

@darakian
Author

darakian commented Aug 1, 2025

For your larger post, I think we touched on most of those points in the QWG meeting yesterday and noted that the concern is less about the small diff from semver 1 to semver 2 and more about the general inconsistency. Also curious where you found the semver 1 regex. If you think there's a point in there I'm glossing over that I shouldn't be or that wasn't addressed synchronously please call it out.

For the new construction, what do you think about this? We keep

"oneOf": [
    {
        "required": ["version", "status"],
        "maxProperties": 2
    },
    {
        "required": ["version", "status", "versionType"],
        "maxProperties": 3
    },
    {
        "required": ["version", "status", "versionType", "lessThan"]
    },
    {
        "required": ["version", "status", "versionType", "lessThanOrEqual"]
    }
],

and in the case that versionType == semver-2.0.0 we require the semver regex validation on the version, lessThan, and lessThanOrEqual parameters. This lets us define

= x as

{
  "version": "x",
  "status": "affected",
  "versionType": "semver-2.0.0"
}

>= x, < y as

{
  "version": "x",
  "lessThan": "y",
  "status": "affected",
  "versionType": "semver-2.0.0"
}

>= x, <= y as

{
  "version": "x",
  "lessThanOrEqual": "y",
  "status": "affected",
  "versionType": "semver-2.0.0"
}

and I'll propose two special cases, which would otherwise be invalid ranges, so that we can capture > x and >= x:
> x as

{
  "version": "x",
  "lessThan": "0.0.0",
  "status": "affected",
  "versionType": "semver-2.0.0"
}

and >= x as

{
  "version": "x",
  "lessThanOrEqual": "0.0.0",
  "status": "affected",
  "versionType": "semver-2.0.0"
}

with the understanding that x != 0.0.0. This lets us represent the concept of "greater than (or equal to)" without introducing new parameters. We could also do 0.0.0-0 or whatever if you like, but some "single special case" so that it's easy to write validation/tooling. Unbounded-below ranges < x and <= x can be encoded with a lower bound manually set to 0.0.0.
That is
< x as

{
  "version": "0.0.0",
  "lessThan": "x",
  "status": "affected",
  "versionType": "semver-2.0.0"
}

and
<= x as

{
  "version": "0.0.0",
  "lessThanOrEqual": "x",
  "status": "affected",
  "versionType": "semver-2.0.0"
}

This gives us all of our normal mathematical range tools aside from two-sided ranges with a non-inclusive lower bound, that is (x, y) / > x, < y and (x, y] / > x, <= y, but maybe we can live without that.

With this construction the version parameter is always present in any expression of a semver-2.0.0 range and version is always a lower bound (or only bound). In the case where it is a non-inclusive lower bound it does not imply that it is the "first known vulnerable" but rather the "last known good". In the case where version stands alone (= x) it also doesn't imply "first". If this approach is agreeable to you (and the group as a whole) I'll update the tests, docs, etc. in another commit.
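The construction above can be sketched as a small matcher. This is a sketch under stated assumptions, not part of the PR: `precedence_key`, `in_range`, and `SENTINEL` are hypothetical names, the regular expression is the one suggested on semver.org (written without anchors because `fullmatch` is used), and the `0.0.0` special case follows the proposal in this comment rather than any existing tooling.

```python
import re

# Regular expression suggested on semver.org for SemVer 2.0.0 strings.
SEMVER_RE = re.compile(
    r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)"
    r"(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?"
    r"(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?",
    re.ASCII,
)
SENTINEL = "0.0.0"  # hypothetical name for the proposed "no upper bound" marker


def precedence_key(version):
    """Validate a SemVer 2.0.0 string and return a key that sorts by precedence."""
    m = SEMVER_RE.fullmatch(version)
    if m is None:
        raise ValueError(f"not a SemVer 2.0.0 string: {version!r}")
    major, minor, patch, prerelease, _build = m.groups()  # build is ignored
    core = (int(major), int(minor), int(patch))
    if prerelease is None:
        return core + ((1,),)  # a release outranks its own pre-releases
    ids = tuple(
        (0, int(p), "") if p.isdigit() else (1, 0, p)  # numeric ids sort first
        for p in prerelease.split(".")
    )
    return core + ((0,) + ids,)


def in_range(entry, observed):
    """Evaluate one entry of the proposed construction (assumes x != 0.0.0)."""
    v = precedence_key(observed)
    lower = precedence_key(entry["version"])
    if "lessThan" in entry:
        if entry["lessThan"] == SENTINEL:  # special case: > x
            return v > lower
        return lower <= v < precedence_key(entry["lessThan"])
    if "lessThanOrEqual" in entry:
        if entry["lessThanOrEqual"] == SENTINEL:  # special case: >= x
            return v >= lower
        return lower <= v <= precedence_key(entry["lessThanOrEqual"])
    return v == lower  # = x
```

For example, `in_range({"version": "1.2.0", "lessThan": "0.0.0"}, "1.2.1")` evaluates the "> 1.2.0" special case, while the same entry without the sentinel branch would match nothing.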

@ElectricNroff

I don't think that typical SemVer implementations would consider it invalid to check whether an observed version number is less than 0.0.0. Consequently, they have no innate knowledge that a range that ends in 0.0.0 is invalid. They would just do the math as defined by the SemVer specification, and conclude that any observed version is simply not inside the range.

Therefore, this is a breaking change because all such code would need to be changed.

Here is one example of SemVer comparison within an Open Source vulnerability scanning product:
https://github.com/wazuh/wazuh/blob/4330cc0020bdcfd19e2f6605a5b2e3886de66b7a/src/wazuh_modules/vulnerability_scanner/src/scanOrchestrator/versionMatcher/versionObjectSemVer.hpp#L117-L170
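The concern about a range ending in 0.0.0 can be illustrated with a deliberately plain range check. This is a hypothetical sketch, not the linked Wazuh code: `release_tuple` and `plain_in_range` are made-up names, and only MAJOR.MINOR.PATCH versions are handled to keep it short.

```python
def release_tuple(version):
    """Parse 'MAJOR.MINOR.PATCH' only; pre-release and build parts are out of scope."""
    major, minor, patch = version.split(".")
    return (int(major), int(minor), int(patch))


def plain_in_range(entry, observed):
    """Range check with no knowledge of any special meaning for 0.0.0."""
    v = release_tuple(observed)
    lower = release_tuple(entry["version"])
    if "lessThan" in entry:
        return lower <= v < release_tuple(entry["lessThan"])
    if "lessThanOrEqual" in entry:
        return lower <= v <= release_tuple(entry["lessThanOrEqual"])
    return v == lower


entry = {
    "version": "1.2.0",
    "lessThan": "0.0.0",
    "status": "affected",
    "versionType": "semver-2.0.0",
}
observed = ["0.0.0", "1.2.0", "1.2.1", "9.9.9"]
matched = [v for v in observed if plain_in_range(entry, v)]
print(matched)  # prints [] because nothing satisfies 1.2.0 <= v < 0.0.0
```

A checker like this reports every observed version as outside the range, which is the opposite of the intended "> 1.2.0" reading.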

@darakian
Author

darakian commented Aug 7, 2025

Indeed it is atypical. It was designed to meet your requirement of the version property being ever-present. It was never going to be typical and I'm fairly certain that was well understood. Just to state it, you had asserted that you would be happy with a new construction that always included the version property even if the meaning of it changed.
Please see: #371 (comment)

Consequently, they have no innate knowledge that a range that ends in 0.0.0 is invalid.

They have no innate knowledge of anything today. Please see: #362

Therefore, this is a breaking change because all such code would need to be changed.

We have yet to define what is and is not a breaking change 👍. Please see #418

@darakian
Author

darakian commented Aug 7, 2025

At the request of the QWG meeting today here are the other two constructions. For ease of readability I'll be breaking these into two posts.

The first was introduced in e637776 and was designed to be completely new so that existing parsers would be the least likely to misinterpret the new data. It was discussed back around this comment: #371 (comment)
Also, to state it clearly, the code I shared in that comment should be considered public domain.

Five new properties are introduced:
inclusiveLowerBound, exclusiveLowerBound, inclusiveUpperBound, exclusiveUpperBound, and exactly

These would allow the construction of the following.

The singleton

= x as

{
  "exactly": "x",
  "status": "affected",
  "versionType": "semver-2.0.0"
}

Two sided ranges

>= x, < y as

{
  "inclusiveLowerBound": "x",
  "exclusiveUpperBound": "y",
  "status": "affected",
  "versionType": "semver-2.0.0"
}

>= x, <= y as

{
  "inclusiveLowerBound": "x",
  "inclusiveUpperBound": "y",
  "status": "affected",
  "versionType": "semver-2.0.0"
}

> x, < y as

{
  "exclusiveLowerBound": "x",
  "exclusiveUpperBound": "y",
  "status": "affected",
  "versionType": "semver-2.0.0"
}

> x, <= y as

{
  "exclusiveLowerBound": "x",
  "inclusiveUpperBound": "y",
  "status": "affected",
  "versionType": "semver-2.0.0"
}

One sided ranges

> x as

{
  "exclusiveLowerBound": "x",
  "status": "affected",
  "versionType": "semver-2.0.0"
}

>= x as

{
  "inclusiveLowerBound": "x",
  "status": "affected",
  "versionType": "semver-2.0.0"
}

< x as

{
  "exclusiveUpperBound": "x",
  "status": "affected",
  "versionType": "semver-2.0.0"
}

<= x as

{
  "inclusiveUpperBound": "x",
  "status": "affected",
  "versionType": "semver-2.0.0"
}

The design is primarily for machines, but I think the wording choice also makes it easy for an uninitiated human with a basic mathematics education to understand the raw data in a pinch. The use of completely new properties is to avoid any interpretation conflict with existing parsers. The choice of breaking out = into a distinct case simplifies parser logic over trying to reuse properties with different interpretation logic dependent on surrounding properties. Otherwise I think this is the simplest and most straightforward choice. It was (and still is) my first choice :)
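To illustrate the claim about simpler parser logic, here is a minimal sketch of a matcher for the five properties. It rests on assumptions: `release_tuple` and `matches` are hypothetical names, only MAJOR.MINOR.PATCH versions are handled (a real implementation would need full SemVer precedence), and an entry with no bound at all is treated as an error.

```python
def release_tuple(version):
    """Parse 'MAJOR.MINOR.PATCH' only; pre-release and build parts are out of scope."""
    major, minor, patch = version.split(".")
    return (int(major), int(minor), int(patch))


def matches(entry, observed):
    """Return True if the observed version falls inside the entry's range."""
    v = release_tuple(observed)
    if "exactly" in entry:
        return v == release_tuple(entry["exactly"])
    # Each bound property maps to exactly one comparison, whatever else is present.
    checks = {
        "inclusiveLowerBound": lambda bound: v >= bound,
        "exclusiveLowerBound": lambda bound: v > bound,
        "inclusiveUpperBound": lambda bound: v <= bound,
        "exclusiveUpperBound": lambda bound: v < bound,
    }
    present = [name for name in checks if name in entry]
    if not present:
        raise ValueError("entry has no bound property")
    return all(checks[name](release_tuple(entry[name])) for name in present)


entry = {
    "exclusiveLowerBound": "2.0.0",
    "inclusiveUpperBound": "2.5.7",
    "status": "affected",
    "versionType": "semver-2.0.0",
}
print(matches(entry, "2.0.0"), matches(entry, "2.5.7"))  # prints "False True"
```

Because no property changes meaning based on its neighbors, one-sided and two-sided ranges go through the same loop.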

@darakian
Author

darakian commented Aug 7, 2025

The second choice was introduced in a72e5b8 to "avoid bloat" by request.
See: #371 (comment)

Two new properties are introduced:
greaterThan and greaterThanOrEqual, which are to be used in conjunction with the prior three (version, lessThan, lessThanOrEqual).

These are then used in expressions as
= x as

{
  "version": "x",
  "status": "affected",
  "versionType": "semver-2.0.0"
}

Two sided ranges

>= x, < y as

{
  "version": "x",
  "lessThan": "y",
  "status": "affected",
  "versionType": "semver-2.0.0"
}

>= x, <= y as

{
  "version": "x",
  "lessThanOrEqual": "y",
  "status": "affected",
  "versionType": "semver-2.0.0"
}

Two sided ranges with exclusive lower bounds were not implemented and it's unclear how to cleanly implement them with the restrictions that were imposed on this implementation. One could consider something like
> x, < y

{
  "greaterThan": "x",
  "lessThan": "y",
  "status": "affected",
  "versionType": "semver-2.0.0"
}

and omit version. However, the constraint was that version must be present. If there are two special cases that allow it not to be present, then why not always allow it? If so, this becomes a name swap with the original proposal.

One sided ranges

> x as

{
  "greaterThan": "x",
  "status": "affected",
  "versionType": "semver-2.0.0"
}

>= x as

{
  "greaterThanOrEqual": "x",
  "status": "affected",
  "versionType": "semver-2.0.0"
}

< x as

{
  "lessThan": "x",
  "status": "affected",
  "versionType": "semver-2.0.0"
}

<= x as

{
  "lessThanOrEqual": "x",
  "status": "affected",
  "versionType": "semver-2.0.0"
}

In retrospect I view this construction as something of a halfway house, and as such it is my least favored option. The third construction here #371 (comment)
encodes the same ranges without the need for two new properties and is a tighter construction imo. I really should have thought of it back in April. C'est la vie.

@andrewpollock

The third construction here #371 (comment)
encodes the same ranges without the need for two new properties is a tighter construction

So in the interests of having current state at the end of this very long comment trail, what does that mean for the path forward?

@darakian
Author

@andrewpollock That's a question for the QWG chairs @david-waltermire, @ccoffin, @MrMegaZone and potentially the board.

I see Dave thumbs up'd the original construction
#371 (comment)
which is my preference, but I'd rather not move forward rewriting/reverting this whole thing until I have clarity on which approach (if any) is acceptable.

@rjb4standards

How would I know which specific versions are between 2.0.0 and 2.5.7 using this approach?

{
"greaterThan": "x",
"lessThan": "y",
"status": "affected",
"versionType": "semver-2.0.0"
}

@darakian
Author

How would I know which specific versions are between 2.0.0 and 2.5.7 using this approach?

They would be the versions V which, when evaluated with the semver ordering/precedence rules
https://semver.org/#spec-item-11
are greater than 2.0.0 and lower than 2.5.7,
or V ∈ (2.0.0, 2.5.7)
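As a rough sketch of how a consumer could apply those precedence rules, the following checks membership in an open interval. The names `precedence_key` and `in_open_interval` are hypothetical, and the input is assumed to be well-formed SemVer 2.0.0 (no validation is done here).

```python
def precedence_key(version):
    """Return a sort key following SemVer 2.0.0 precedence (spec item 11)."""
    version = version.split("+", 1)[0]  # build metadata does not affect precedence
    core, _, prerelease = version.partition("-")
    major, minor, patch = (int(part) for part in core.split("."))
    if not prerelease:
        # A release has higher precedence than any of its pre-releases.
        return (major, minor, patch, (1,))
    ids = tuple(
        # Numeric identifiers compare as integers and sort before alphanumeric ones.
        (0, int(p), "") if p.isdigit() else (1, 0, p)
        for p in prerelease.split(".")
    )
    return (major, minor, patch, (0,) + ids)


def in_open_interval(observed, low, high):
    """True if low < observed < high under SemVer precedence."""
    return precedence_key(low) < precedence_key(observed) < precedence_key(high)


candidates = ["2.0.0", "2.3.4-beta", "2.3.4", "2.5.7", "2.6.0"]
inside = [v for v in candidates if in_open_interval(v, "2.0.0", "2.5.7")]
print(inside)  # prints ['2.3.4-beta', '2.3.4']
```

The consumer does not need an enumerated list of versions; it only needs to evaluate the versions it actually has against the bounds.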

@rjb4standards

Got it. So the software producer will know which versions are in the set, but how would a consumer know which versions are in the set?

@darakian
Author

darakian commented Aug 18, 2025

They would check for whichever version(s) of interest are relevant.

Edit: If the question you're asking is more along the lines of "How does one represent a list of discrete versions?" then that would be done with multiple version statements. The specific word choice would depend on which construction is chosen, but a pseudo example could be

{
"exactly": "x1",
"status": "affected",
"versionType": "semver-2.0.0"
}
{
"exactly": "x2",
"status": "affected",
"versionType": "semver-2.0.0"
}
{
"exactly": "x3",
"status": "affected",
"versionType": "semver-2.0.0"
}
....

@rjb4standards

rjb4standards commented Aug 18, 2025

Got it, so if they are running version 2.3.4-beta this would be in the set because it's between 2.0.0 and 2.5.7
Do you know if/when NIST NVD will support this VERS range in their search API?

@darakian
Author

Right. By the rules of semver 2.0.0 < 2.3.4-beta < 2.5.7 and so your version is affected (or unaffected if the record says unaffected instead).

Do you know if/when NIST NVD will support this VERS range in their search API?

After this PR merges, at the very least. I have no idea if this is even on their radar though, to be honest. If there's a tooling-related question
https://github.com/CVEProject/automation-working-group
is likely a better place for it.

@alilleybrinker

@rjb4standards this is distinct from NIST's NVD search API. The Record Format (what we're discussing here), is managed by the CVE project. NVD, maintained by NIST, is a downstream consumer of CVE data. So even if/when this proposal for a new version type is added to CVE, it'll be up to NVD what to do about it.

@rjb4standards

@alilleybrinker Thanks for clarifying. Does this mean the CVE Foundation will have a searchable API that supports the semver range? For example, show me all the CVEs for ACME 2.3.4-beta?

@alilleybrinker

@rjb4standards the CVE Foundation is different from the CVE Project. As for what the CVE Project would do for improving search, I recommend talking to the Automation Working Group and/or the Consumer Working Group. You can find out more about the groups here: https://www.cve.org/ProgramOrganization/WorkingGroups

@rjb4standards

@alilleybrinker thanks for clarifying. The CVE space is getting very confusing with the looming funding deadline coming fast.

Labels
None yet
Projects
Status: In Progress

8 participants